Simple and Optimal Methods for Stochastic Variational Inequalities, II: Markovian Noise and Policy Evaluation in Reinforcement Learning

Authors

Abstract

The focus of this paper is on stochastic variational inequalities (VIs) under Markovian noise. A prominent application of our algorithmic developments is the stochastic policy evaluation problem in reinforcement learning. Prior investigations in the literature focused on temporal difference (TD) learning by employing nonsmooth finite-time analysis motivated by stochastic subgradient descent, leading to certain limitations. These limitations encompass the requirement of analyzing a modified TD algorithm that involves projection onto an a priori defined Euclidean ball, achieving a nonoptimal convergence rate, and no clear way of deriving the beneficial effects of parallel implementation. Our approach remedies these shortcomings in the broader context of stochastic VIs, and in particular when it comes to stochastic policy evaluation. We develop a variety of simple TD learning type algorithms motivated by its original version that maintain its simplicity, while offering distinct advantages from a nonasymptotic point of view. We first provide an improved analysis of the standard TD algorithm that can benefit from parallel implementation. Then we present versions of a conditional TD algorithm (CTD), involving periodic updates of the iterates, which reduce the bias and therefore exhibit improved iteration complexity. This brings us to the fast TD (FTD) algorithm, which combines elements of CTD and the stochastic operator extrapolation method of the companion paper. For a novel index-resetting step-size policy, FTD exhibits the best known convergence rate. We also devised a robust version of the algorithm, particularly suitable for discounting factors close to 1.
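To make the policy evaluation setting concrete, here is a minimal, self-contained sketch of standard tabular TD(0) on a small Markov reward process, run along a single Markovian trajectory as in the setting the abstract describes. The three-state chain, rewards, discount factor, and constant step size are illustrative assumptions; this is plain TD(0), not the CTD or FTD algorithms developed in the paper.

```python
import random

# Illustrative 3-state Markov reward process (hypothetical data).
P = [  # transition probabilities P[s][s']
    [0.5, 0.5, 0.0],
    [0.1, 0.6, 0.3],
    [0.2, 0.2, 0.6],
]
R = [1.0, 0.0, 2.0]   # expected one-step reward in each state
GAMMA = 0.9           # discounting factor

def td0(num_steps, alpha=0.05, seed=0):
    """Tabular TD(0) along a single Markovian trajectory (no projection)."""
    rng = random.Random(seed)
    V = [0.0, 0.0, 0.0]
    s = 0
    for _ in range(num_steps):
        s_next = rng.choices([0, 1, 2], weights=P[s])[0]
        delta = R[s] + GAMMA * V[s_next] - V[s]  # TD error
        V[s] += alpha * delta
        s = s_next
    return V

def true_values():
    """Solve the Bellman system (I - gamma*P) V = R by Gaussian elimination."""
    n = len(R)
    A = [[(1.0 if i == j else 0.0) - GAMMA * P[i][j] for j in range(n)]
         for i in range(n)]
    b = list(R)
    for i in range(n):  # forward elimination with partial pivoting
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        for r in range(i + 1, n):
            f = A[r][i] / A[i][i]
            for c in range(i, n):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    V = [0.0] * n
    for i in reversed(range(n)):
        V[i] = (b[i] - sum(A[i][j] * V[j] for j in range(i + 1, n))) / A[i][i]
    return V

if __name__ == "__main__":
    print(td0(200_000))
    print(true_values())
```

With a constant step size, TD(0) converges only to a neighborhood of the true value function whose radius depends on the step size and the Markovian noise; the paper's averaging (CTD) and extrapolation (FTD) schemes are aimed precisely at sharpening this nonasymptotic behavior.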



Similar references

Variational methods for Reinforcement Learning

We consider reinforcement learning as solving a Markov decision process with unknown transition distribution. Based on interaction with the environment, an estimate of the transition matrix is obtained from which the optimal decision policy is formed. The classical maximum likelihood point estimate of the transition model does not reflect the uncertainty in the estimate of the transition model ...
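The maximum-likelihood point estimate that this passage contrasts can be sketched as follows; a symmetric Dirichlet posterior is shown as one simple way to retain the uncertainty that the point estimate discards. The observed transitions and prior strength are hypothetical, and this is only an illustration of the contrast, not the paper's variational method.

```python
from collections import Counter

def ml_row(transitions, n_states):
    """Maximum-likelihood point estimate of one transition row P(s'|s)
    from a list of observed next states."""
    counts = Counter(transitions)
    total = len(transitions)
    return [counts[j] / total for j in range(n_states)]

def dirichlet_posterior_row(transitions, n_states, prior=1.0):
    """Posterior mean and per-entry variance of the same row under a
    symmetric Dirichlet(prior) prior; the variance quantifies the
    uncertainty the ML point estimate ignores."""
    counts = Counter(transitions)
    alpha = [prior + counts[j] for j in range(n_states)]
    a0 = sum(alpha)
    mean = [a / a0 for a in alpha]
    var = [a * (a0 - a) / (a0 * a0 * (a0 + 1)) for a in alpha]
    return mean, var

if __name__ == "__main__":
    obs = [0, 1, 1, 2, 1]  # five hypothetical transitions out of one state
    print(ml_row(obs, 3))
    print(dirichlet_posterior_row(obs, 3))
```

With few observations the posterior mean is pulled toward the uniform prior and the variances are large; as counts grow, both estimates agree and the variances shrink, which is the uncertainty behavior the point estimate cannot express.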


Strong convergence for variational inequalities and equilibrium problems and representations

We introduce an implicit method for finding a common element of the set of solutions of systems of equilibrium problems and the set of common fixed points of a sequence of nonexpansive mappings and a representation of nonexpansive mappings. Then we prove the strong convergence of the proposed implicit schemes to the unique solution of a variational inequality, which is the optimality condition for ...


Designing and validating a textbook evaluation questionnaire for reading comprehension II and exploring its relationship with achievement

In any educational program, the most important factor affecting students' success is the textbook (McDonough & Shaw, 2003). In fact, the textbook is the heart of English language teaching (Sheldon, 1988). Given the great importance of the textbook as an essential element of language classes, textbooks must be carefully evaluated and selected to prevent any negative impact on students (Litz). This study, by designing a textbook evaluation questionnaire that gives instructors the opportunity for valid evaluation ...


Optimal policy switching algorithms for reinforcement learning

We address the problem of single-agent, autonomous sequential decision making. We assume that some controllers or behavior policies are given as prior knowledge, and the task of the agent is to learn how to switch between these policies. We formulate the problem using the framework of reinforcement learning and options (Sutton, Precup & Singh, 1999; Precup, 2000). We derive gradient-based algor...



Journal

Journal title: SIAM Journal on Optimization

Year: 2022

ISSN: 1095-7189, 1052-6234

DOI: https://doi.org/10.1137/20m1381691